Abstract. Making a Prolog program more efficient by transforming its 
source code, without changing its operational semantics, is not an ob- 
vious task. It requires the user to have a clear understanding of how 
the Prolog compiler works, and in particular, of the effects of 'impure' 
features like the cut. The way Prolog code is written - e.g., the order 
of clauses, the order of literals in a clause, the use of cuts or negations 
- influences its efficiency. Furthermore, different optimisation techniques 
may be redundant or conflicting when they are applied together, depend- 
ing on the way a procedure is called - e.g., inserting cuts and enabling 
indexing. We present an optimiser, based on abstract interpretation, that 
automatically performs safe code transformations of Prolog procedures 
in the context of some class of input calls. The method is more effective 
if procedures are annotated with additional information about modes, 
types, sharing, number of solutions and the like. Thus the approach is 
similar to Mercury. It applies to any Prolog program, however. 
1 Introduction 

Programming in Prolog allows us to write so-called multidirectional procedures, 
in the sense that the same code of a procedure can be used in more than one 
way (the arguments being either input data or output results). An undesirable 
consequence of multidirectionality is inefficiency (in terms of space utilisation 
and of execution time). This efficiency issue stems from the general execution 
model of Prolog, which is usually based on the Warren Abstract Machine 
(WAM, for short) [1,14]. Answer substitutions are computed according to a 
depth-first search strategy with backtracking, where clauses are executed from 
top-to-bottom, and literals are executed from left-to-right inside a clause. Given 
the incompleteness of Prolog, some input-output patterns can loop. Also, Prolog 
uses a general algorithm for unification, with no restriction on the terms be- 
ing unified, but most compilers do not perform the occur-check test during the 
unification. For efficiency reasons, some built-in procedures are not multidirec- 
tional (for example, arithmetic and comparison predicates). Negation as failure 
is sound only if it applies to a ground literal. Due to the incompleteness and 
unsoundness of Prolog, not every ordering of clauses and literals is operationally 
correct, and the way a procedure is written greatly influences the search for its 
solutions and, hence, its efficiency. 

A solution for optimising multidirectional procedures is to generate spe- 
cialised code for each particular use of the procedure. In the context of a di- 
rectionality, one can try to find a more efficient ordering of clauses and literals, 
such that the program still remains operationally correct for that directional- 
ity. In Prolog, we can also try to insert cuts to prune the search tree without 
removing solutions. This can greatly reduce the size of the search tree and 
improve efficiency. Applying code transformations correctly is not obvious, and 
doing so manually is error-prone. This paper describes an optimiser based on 
abstract interpretation, which performs this task automatically. 

To illustrate the interest of specialising code, consider the multidirectional 
procedure efface(X,T,TEff), which is the running example of [6]: X is an element 
of list T, and TEff is the list T without the first occurrence of X in T. 

efface(X, [H|T], [H|TEff]) :- efface(X, T, TEff), not(X=H). 
efface(X, [X|T], T). 

This code can be used in several ways: either when every input argument is a 
ground term; or when inputs X and T are ground and TEff is a variable; or when 
inputs X and TEff are ground and T is a variable; or when inputs X and TEff are 
variables and T is ground; etc. Now, if we consider only the first directionality, 
then our optimiser will be able to automatically generate the following specialised 
code (the optimiser has checked that the procedure is deterministic for this 
directionality): 

efface(X, [X|T], T) :- !. 
efface(X, [H|T], [H|TEff]) :- efface(X, T, TEff). 

The clauses are reordered, a cut has been inserted in the first clause, and the 
negation is removed. For all inputs satisfying the first directionality, the 
sequences of answer substitutions of the specialised and of the multidirectional 
codes are identical. Table 1 compares the execution of the multidirectional and 
the specialised codes. Several tests have been performed by varying the length 
of the input list T. The table shows that the multidirectional code is less 
efficient than the specialised one in terms of execution time and of local stack 
usage. In particular, the specialised code uses a constant amount of local stack 
(independently of the size of the input), while the multidirectional code raises 
a local stack error for an input list of size 25000. The speedup increases with 
the size of the input: for instance, the optimised code is 3.61 times faster for 
an input list of size 10000. 
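
The claim that both codes compute the same answers for the first directionality can be tried out on small inputs. The following is only a sketch: the names efface_multi/3, efface_spec/3 and same_answers/2 are introduced here for illustration and do not come from the paper, and `\+` is used as the standard spelling of not/1.

```prolog
% Multidirectional version (renamed efface_multi/3 for this sketch).
efface_multi(X, [H|T], [H|TEff]) :- efface_multi(X, T, TEff), \+ X = H.
efface_multi(X, [X|T], T).

% Specialised version for ground X and ground list T
% (renamed efface_spec/3 for this sketch).
efface_spec(X, [X|T], T) :- !.
efface_spec(X, [H|T], [H|TEff]) :- efface_spec(X, T, TEff).

% same_answers(+X, +T): both versions produce the same sequence of
% answers for the third argument, in the same order.
same_answers(X, T) :-
    findall(R, efface_multi(X, T, R), Answers),
    findall(R, efface_spec(X, T, R), Answers).
```

For example, the query `?- same_answers(b, [a,b,c,b]).` succeeds. Such a test only checks individual inputs; it says nothing about the whole class of calls, which is what the optimiser establishes by analysis.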

Table 1. Program efface(X,T,TEff) executed 1000 times, when X is a ground term, T 
is a ground list, and TEff is a variable. The program has been tested on a 1.5 GHz 
Pentium; 1 GB RAM; Linux Suse; SWI-Prolog v 5.4.6 [15]. The size to which the local 
stack is allowed to grow is 2048000 bytes. 

Our approach is strongly inspired by [6], where a methodology to build correct 
programs is proposed: starting from a specification and a so-called logic 
description of the problem, the methodology constructs operationally correct 
programs which are not written in the usual style of experienced Prolog 
programmers: procedures are normalised, with explicit unifications, and are thus 
inefficient. The author of [6] then proposes to apply some code transformations, 
in order to produce more efficient programs (written in the usual style and where 
cuts are introduced). It is not obvious how to ensure that the transformations 
are applied correctly, nor how to choose the best order in which to apply them. 
Our optimiser does not require or assume that Prolog programs are written in a 
specialised syntax. It accepts any kind of Prolog program, so the programmer is 
free to write programs in any style, normalised or not. Our optimiser then 
automatically specialises the program for some directionality, by choosing a 
suitable order for applying the transformations, and by ensuring that the 
transformations are correctly applicable. 

The transformations performed by the system are related to partial eval- 
uation: literals are evaluated at compile-time, such that some of them can be 
removed, and some unifications can be simplified. However, we do not unfold 
procedure calls as it is normally done in partial deduction. 

The optimiser is based on an abstract interpretation framework [3,10,11] 
that collects and verifies the semantic information needed for the correct 
application of source-to-source transformations. The operational properties 
captured and verified by the framework that are useful for the purpose of the 
optimiser are: cardinality information (including determinacy and conditions for 
sure success and failure), detection of the exclusivity between clauses, 
information about the mode, type, sharing, linearity, and size of input/output 
terms, occur-check freeness, and induction parameters for proving termination. 

The rest of the paper is organised as follows. Section 2 presents the source- 
to-source transformations we apply to Prolog programs, and specifies the conditions 
for applying such transformations correctly. Section 3 describes briefly 
the abstract interpretation framework. Formal specifications are introduced, to 
allow the user to express the directionality for which the code must be optimised. 
Section 4 illustrates in which order the transformations are applied. Section 5 
reports on experimental results, and Section 6 presents the related work. 

2 Source-to-source Transformations 

Optimisation is carried out by means of transformations based on the operational 
semantics of Prolog. Section 2.1 describes the Prolog programs accepted 
by the analyser, and the syntax of normalised programs, on which most code 
transformations are applied. Section 2.2 introduces the concept of sequence of 
answer substitutions, which allows us to describe the sufficient conditions for 
applying correct source-to-source transformations. Section 2.3 defines some 
semantic properties that characterise sequences of answer substitutions. Such 
properties are undecidable, but can be safely approximated by our framework. 
Finally, Section 2.4 presents the transformation rules. 

2.1 Prolog Syntax 

The optimiser accepts any Prolog program that is ISO conformant, with special 
features like the cut and the negation, as well as constructs of the form ; 
(disjunction) and -> ; (if-then-else). The abstract interpretation framework of 
the optimiser is designed on normalised programs, and most code transformations 
apply to such programs. A normalised procedure pr is a nonempty sequence 
of clauses c. Each normalised clause c has the form h :- g, where the head h is of 
the form $p(X_1, \ldots, X_n)$, whereas the body g is a possibly empty sequence of 
normalised literals. A normalised literal is either a built-in of the form 
$X_{i_1} = X_{i_2}$, a built-in of the form $X_{i_1} = f(X_{i_2}, \ldots, X_{i_n})$, 
a procedure call $p(X_{i_1}, \ldots, X_{i_n})$, a cut !, or a negation not(l), where 
l is a normalised literal. The variables occurring in a literal are all distinct; 
all clauses of a procedure have exactly the same head; if a clause uses m different 
variables, these variables are $X_1, \ldots, X_m$. Observe that every Prolog 
program can be rewritten into an equivalent normalised program. 
In the rest of this paper, P denotes the given normalised program. 

2.2 Sequence of Answer Substitutions 

Semantically, a normalised procedure pr can be viewed as a function mapping 
every input substitution $\theta$ to a sequence of answer substitutions S. A 
substitution $\theta$ is a finite set of the form $\{X_1/t_1, \ldots, X_n/t_n\}$ 
where $X_1, \ldots, X_n$ are distinct program variables, and where 
$t_1, \ldots, t_n$ are terms (variables occurring in terms are standard variables; 
the sets of standard and program variables are disjoint). A sequence of answer 
substitutions S can be either finite $\langle\theta_1, \ldots, \theta_k\rangle$ 
($k \geq 0$), or infinite $\langle\theta_1, \ldots, \theta_k, \ldots\rangle$ 
($k \in \mathbb{N}$), or incomplete $\langle\theta_1, \ldots, \theta_k, \bot\rangle$ 
($k \geq 0$), where the symbol $\bot$ denotes that the procedure loops. To express 
this behaviour, we use the notation of [11]: $(\theta, pr) \mapsto_P S$ for a 
procedure, $(\theta, c) \mapsto_P S$ for a clause, and $(\theta, g, c) \mapsto_P S$ 
for a prefix of a clause. In the rest of this paper, we assume that each procedure 
terminates. Thus, we only consider finite sequences of answer substitutions. Our 
optimiser is based on the abstract interpretation framework defined in [10], which 
uses induction parameters to verify that a procedure terminates. Notice that every 
procedure of our benchmarks terminates. The substitutions of S are denoted by 
Subst(S), and the length of S is denoted by |S|. 
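
For a terminating call, a finite sequence of answer substitutions can be observed directly with findall/3, which collects answers in the order in which Prolog computes them. The sketch below uses the efface procedure of the introduction; answer_sequence/3 is a helper name introduced here for illustration only.

```prolog
% The multidirectional procedure of the introduction (\+ stands for not/1).
efface(X, [H|T], [H|TEff]) :- efface(X, T, TEff), \+ X = H.
efface(X, [X|T], T).

% answer_sequence(+Goal, ?Template, -Seq): Seq lists the instances of
% Template for the successive answers of Goal, in computation order.
% This only terminates when Goal has a finite sequence of answers.
answer_sequence(Goal, Template, Seq) :-
    findall(Template, Goal, Seq).
```

With X and TEff ground and T a variable, the query `?- answer_sequence(efface(a, T, [b]), T, S).` gives `S = [[b,a],[a,b]]`, a sequence of length 2. With X and T ground, `?- answer_sequence(efface(a, [b,a,c], R), R, S).` gives the one-element sequence `S = [[b,c]]`.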

2.3 Semantic Properties 

This section defines some semantic properties that characterise the execution 
of literals, prefixes of the body of a clause, clauses, procedures, in terms of the 
length of their sequence of answer substitutions. Such properties are useful to 
express the sufficient conditions to apply safely the code transformations. 



Let pr be a procedure of arity n, let c be a clause of the procedure pr, let 
(g, l) be a prefix of the clause c, where g is a goal and l is a literal. Let 
$\theta$ be an input program substitution with domain $\{X_1, \ldots, X_n\}$. 
Consider the execution of the procedure $(\theta, pr) \mapsto_P S_{pr}$, the 
execution of the clause $(\theta, c) \mapsto_P S_c$, and the execution of the 
prefix of the clause $(\theta, g, c) \mapsto_P S_g$. Assume that the literal l 
is of the form $q(X_{i_1}, \ldots, X_{i_r})$. The execution of the literal l after 
the execution of the goal g can be described for all $\theta' \in Subst(S_g)$ by 
$(\theta_{\theta'}, l) \mapsto_P S_{\theta'}$, where 
$X_k\theta_{\theta'} = X_{i_k}\theta'$ ($1 \leq k \leq r$). We can now define the 
following properties (the terminology of [6] is used): 

- The procedure pr, or the clause c, or the prefix of a clause g, or the literal l 
  after the execution of the goal g is deterministic w.r.t. $\theta$ iff their 
  sequence of answer substitutions has at most one computed answer substitution: 
  $|S_{pr}|, |S_c|, |S_g|, |S_{\theta'}| \in \{0, 1\}$, for all 
  $\theta' \in Subst(S_g)$. 

- The procedure pr, or the clause c, or the prefix of a clause g, or the literal l 
  after the execution of the goal g is fully deterministic w.r.t. $\theta$ iff 
  their sequence of answer substitutions has one and only one answer substitution: 
  $|S_{pr}| = |S_c| = |S_g| = |S_{\theta'}| = 1$, for all $\theta' \in Subst(S_g)$. 

- The procedure pr, or the clause c, or the prefix of a clause g, or the literal l 
  after the execution of the goal g surely succeeds w.r.t. $\theta$ iff their 
  sequence of answer substitutions has at least one answer substitution: 
  $|S_{pr}|, |S_c|, |S_g|, |S_{\theta'}| \geq 1$, for all $\theta' \in Subst(S_g)$. 

- The literal l is a test literal after the execution of the goal g w.r.t. 
  $\theta$ if it is not a cut, and it is deterministic w.r.t. $\theta$, and it does 
  not instantiate any variable. For all $\theta' \in Subst(S_g)$, $S_{\theta'}$ is 
  either the empty sequence $\langle\rangle$ or the sequence 
  $\langle\theta_{\theta'}\rangle$. 

- The two procedures $pr_1$ and $pr_2$ (with the same arity n) are exclusive 
  w.r.t. $\theta$ iff either the execution of $pr_1$ fails, or the execution of 
  $pr_2$ fails, or both executions of $pr_1$ and $pr_2$ fail, i.e., if 
  $(\theta, pr_1) \mapsto_P S_1$ and $(\theta, pr_2) \mapsto_P S_2$, then 
  $S_1 = \langle\rangle$ or $S_2 = \langle\rangle$. 
  The same definition applies for exclusivity between two clauses of same arity, 
  or between two prefixes of clauses of same arity. 

The above properties are undecidable but can be safely approximated by our 
abstract interpretation framework presented in Section 3. 
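
Although the properties are undecidable for classes of inputs, each of them can be tested on a single concrete, terminating call by counting answers. The following sketch does exactly that; all predicate names are introduced here for illustration and are not part of the optimiser, which establishes these properties statically instead.

```prolog
% solutions(+Goal, -N): N is the length of the (finite) answer sequence
% of Goal.
solutions(Goal, N) :-
    findall(x, Goal, Answers),
    length(Answers, N).

% The properties of this section, checked for one concrete call only.
deterministic_on(Goal)       :- solutions(Goal, N), N =< 1.
fully_deterministic_on(Goal) :- solutions(Goal, 1).
surely_succeeds_on(Goal)     :- solutions(Goal, N), N >= 1.

% exclusive_on(+G1, +G2): at least one of the two goals fails.
exclusive_on(G1, G2) :- ( \+ G1 -> true ; \+ G2 ).
```

For instance, `deterministic_on(memberchk(a, [a,b,a]))` holds whereas `deterministic_on(member(a, [a,b,a]))` does not, because member/2 yields two answers for that call.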

2.4 Transformation Rules 

The subsequent transformation rules are adapted from [6]. Let pr be a normalised 
procedure whose arity is n, and let $\theta$ be a program substitution whose 
domain is $\{X_1, \ldots, X_n\}$. A rule transforming the procedure pr of the 
program P into the procedure pr' is correct w.r.t. $\theta$ if the executions of 
pr (in the context of the program P) and of pr' (in the context of the program P', 
which is the program P where pr has been replaced by pr') produce the same 
sequence of answer substitutions. In other words, if $(\theta, pr) \mapsto_P S$ 
and $(\theta, pr') \mapsto_{P'} S'$ then $S = S'$. The conditions given for 
applying the transformations are expressed using the semantic properties defined 
in Section 2.3. Such conditions are sufficient but not always necessary. It is 
assumed that the procedure pr has no side-effects when it is executed with 
input $\theta$. 

Rule 1: Reorder clauses 

    $c_i$ : $p(X_1, \ldots, X_n)$ :- $g_i$. 
    $c_j$ : $p(X_1, \ldots, X_n)$ :- $g_j$. 
        ⇓ 
    $c_j$ : $p(X_1, \ldots, X_n)$ :- $g_j$. 
    $c_i$ : $p(X_1, \ldots, X_n)$ :- $g_i$. 

where: 

- for all $k \in \{i, j\}$, the clause $c_k$ is deterministic w.r.t. $\theta$; 

- for all $k, l \in \{i, j\}$ such that $k \neq l$, we have that $c_k$ and $c_l$ 
  are exclusive w.r.t. $\theta$; 

- for all $k \in \{i, j\}$, there is no cut in $g_k$. 

Rule 2: Insert green cuts 

where: 

- the goal $(l_1, \ldots, l_i)$ is deterministic w.r.t. $\theta$; 

- for all $z \in \{k+1, \ldots, r\}$, the goal $(l_1, \ldots, l_i)$ and clause 
  $c_z$ are exclusive w.r.t. $\theta$. 

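Rule 3: Eliminate dead code 

    $c_1$ : $p(X_1, \ldots, X_n)$ :- $g_1$. 
    ... 
    $c_k$ : $p(X_1, \ldots, X_n)$ :- $l_1, \ldots, l_i$, !, $l_{i+1}, \ldots, l_s$. 
    ... 
    $c_r$ : $p(X_1, \ldots, X_n)$ :- $g_r$. 
        ⇓ 
    $c_1$ : $p(X_1, \ldots, X_n)$ :- $g_1$. 
    ... 
    $c_k$ : $p(X_1, \ldots, X_n)$ :- $l_1, \ldots, l_i$, !, $l_{i+1}, \ldots, l_s$. 

where: 

- the goal $(l_1, \ldots, l_i)$ surely succeeds w.r.t. $\theta$. 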



Rule 4: Move backwards cut 

    $c_k$ : $p(X_1, \ldots, X_n)$ :- $l_1, \ldots, l_{i-1}$, !, $l_i, \ldots, l_s$. 
        ⇓ 
    $c_k$ : $p(X_1, \ldots, X_n)$ :- $l_1, \ldots, l_{i-1}, l_i$, !, $\ldots, l_s$. 

where: 

- $l_i$ is fully deterministic after the execution of goal 
  $(l_1, \ldots, l_{i-1})$ w.r.t. $\theta$. 
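Rule 5: Remove useless test literals 

    $c_1$ : $p(X_1, \ldots, X_n)$ :- $l_1, \ldots, l_{i-1}, l_i, l_{i+1}, \ldots, l_s$. 
    ... 
    $c_k$ : $p(X_1, \ldots, X_n)$ :- $l_1, \ldots, l_{i-1}, l_i, l_{i+1}, \ldots, l_s$. 
    ... 
    $c_r$ : $p(X_1, \ldots, X_n)$ :- $g_r$. 
        ⇓ 
    $c_1$ : $p(X_1, \ldots, X_n)$ :- $l_1, \ldots, l_{i-1}, l_i, l_{i+1}, \ldots, l_s$. 
    ... 
    $c_k$ : $p(X_1, \ldots, X_n)$ :- $l_1, \ldots, l_{i-1}, l_{i+1}, \ldots, l_s$. 
    ... 
    $c_r$ : $p(X_1, \ldots, X_n)$ :- $g_r$. 

where: 

- $l_i$ is a test literal after the execution of goal $(l_1, \ldots, l_{i-1})$ 
  w.r.t. $\theta$; 

- $l_i$ is deterministic after the execution of goal $(l_1, \ldots, l_{i-1})$ 
  w.r.t. $\theta$; 

- $l_i$ is fully deterministic after the execution of goal 
  $(l_1, \ldots, l_{i-1})$ w.r.t. $\theta$, 
  or 
  $\exists z \in \{1, \ldots, k-1\}$ such that a cut is surely executed in clause 
  $c_z$ w.r.t. $\theta$. 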

The abstract interpretation framework that safely approximates the condi- 
tions for applying the above source-to-source transformations is discussed in the 
next section. 
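
To make the effect of cut insertion and test removal concrete, here is a hedged sketch on a hypothetical procedure max/3 that is not taken from the paper. It assumes that the first two arguments are ground numbers, and it gives one plausible reading of how Rules 2 and 5 would apply; the three predicate names are chosen here for illustration.

```prolog
% Original procedure (hypothetical example, not from the paper).
max_multi(X, Y, Z) :- X =< Y, Z = Y.
max_multi(X, Y, Z) :- X > Y, Z = X.

% After inserting a green cut (Rule 2): for ground numbers, the goal
% X =< Y is deterministic and exclusive with the second clause.
max_cut(X, Y, Z) :- X =< Y, !, Z = Y.
max_cut(X, Y, Z) :- X > Y, Z = X.

% After removing the useless test (Rule 5): the cut is surely executed
% whenever X =< Y holds, so the second clause is only reached when
% X > Y, and the test X > Y can be dropped.
max_spec(X, Y, Z) :- X =< Y, !, Z = Y.
max_spec(X, _Y, Z) :- Z = X.
```

The output unification Z = Y is kept after the cut, so the three versions also agree when the third argument is already bound: `max_spec(1, 2, 1)` fails, as does `max_multi(1, 2, 1)`.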

3 Abstract Interpretation Framework 

This section presents the abstract interpretation framework [7, 10] that captures 
and checks the semantic properties (and other ones) defined in Section 2.3. This 
semantic information (e.g., about the determinacy, the exclusivity) is needed for 
ensuring correct application of the code transformations. Section 3.1 describes 
the two fundamental abstract domains. An abstract substitution represents a set 
of program substitutions, and an abstract sequence represents a set of sequences 
of answer substitutions. Section 3.2 illustrates the syntax of formal specifications, 
that allows the user to write abstract sequences in a convenient syntax. The 
abstract execution is briefly presented in Section 3.3. Finally, Section 3.4 shows 
how abstract domains are used to check the semantic properties. 



3.1 Abstract Substitutions and Abstract Sequences 

The domain of abstract substitutions is an instantiation to modes, types, 
linearity and possible sharing of the generic abstract domain Pat($\Re$) described 
in [4,9]. An abstract substitution represents a set of program substitutions of 
the form $\{X_1/t_1, \ldots, X_n/t_n\}$, where the $X_i$'s are program variables, 
and the $t_i$'s are terms (variables occurring in terms are standard variables; 
the sets of standard and program variables are disjoint). The set of substitutions 
represented by $\beta$ is denoted by $Cc(\beta)$. Formally, an abstract 
substitution $\beta$ is a triple of the form $\langle sv, frm, \alpha\rangle$. The 
same-value (sv) and frame (frm) components provide information about the structure 
of terms. Each term described in $\beta$ is represented by an index. The sv 
component maps each program variable X to its corresponding index. Hence, the 
equality sv(X) = sv(Y) means that variables X and Y are bound to the same term. 
The frm component describes the pattern of some indices, by giving their functor 
name and the indices of their composing subterms. The alpha tuple $\alpha$ is the 
generic part of the domain. It provides extra information about all terms and 
subterms of interest (represented by the indices). In the current analyser, 
$\alpha$ is of the form $\langle mo, ty, ps, lin, E\rangle$. The mo component [9] 
maps each index to its mode (e.g., gr (ground), var (variable)). The ty component 
maps each index to its so-called type expression [7] (e.g., list(int) denotes the 
set of all lists of integers, list(any) denotes the set of all possibly 
non-instantiated lists). The ps component [9] is a binary relation over indices, 
and expresses the possible sharing between two terms. Pairs of indices that do not 
belong to ps surely do not share a variable. The lin component contains all 
indices that are surely linear (i.e., they do not contain several occurrences of 
the same variable). The E component is a set of linear relations between the sizes 
of terms (several norms can be combined). The abstract substitution whose 
concretisation is the empty set is denoted by $\bot$. The greatest lower bound 
between two abstract substitutions $\beta_1$ and $\beta_2$ is denoted by 
$\beta_1 \sqcap \beta_2$ and is such that 
$Cc(\beta_1) \cap Cc(\beta_2) = Cc(\beta_1 \sqcap \beta_2)$. 

The domain of abstract sequences models the operational behaviour of a Prolog 
procedure. An abstract sequence B describes a set of pairs $(\theta, S)$ where 
$\theta$ is a program substitution and S is the sequence of answer substitutions 
resulting from executing a procedure (a clause, a goal, etc.) with input 
substitution $\theta$. An abstract sequence B is a tuple of the form 
$\langle\beta_{in}, \beta_{ref}, \beta_{fails}, U, \beta_{out}, E_{ref\_out}, E_{sol}\rangle$ 
that imposes conditions on the pairs $(\theta, S)$. The set of pairs $(\theta, S)$ 
satisfying the conditions imposed by B is denoted by $Cc(B)$. The input abstract 
substitution $\beta_{in}$ describes the class of accepted input calls: 
$\theta \in Cc(\beta_{in})$. The refined abstract substitution $\beta_{ref}$ 
describes the successful input calls, i.e., those that produce at least one 
solution: $S \neq \langle\rangle$ implies $\theta \in Cc(\beta_{ref})$. The set of 
abstract substitutions $\beta_{fails}$ describes conditions of sure failure: the 
sequence S is empty if the input substitution satisfies one of the abstract 
substitutions of $\beta_{fails}$, i.e., if there exists 
$\beta_f \in \beta_{fails}$ such that $\theta \in Cc(\beta_f)$ then 
$S = \langle\rangle$. The untouched component U describes the set of input terms 
that are untouched (non-instantiated) during the execution. The output abstract 
substitution $\beta_{out}$ describes the substitutions belonging to S: for each 
$\theta'$ in Subst(S), we have that $\theta' \in Cc(\beta_{out})$. The size 
relations component $E_{ref\_out}$ is a set of linear relations (equations and 
inequations) between the sizes of the input/output terms. The cardinality 
component $E_{sol}$ is a set of linear relations between the number of solutions 
and the size of the input terms, i.e., sol = |S| is a solution of the system 
$E_{sol}$. 
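
As an informal illustration of what the input, size relations and cardinality components constrain, the sketch below tests one concrete call of efface against the conditions that the specification of Section 3.2 states for ground X, ground list T and variable TEff: at most one solution, and an output list one element shorter than the input list. The predicate check_first_directionality/2 is a name introduced here; it is a runtime test on a single call, not the static verification performed by the analyser.

```prolog
% The multidirectional procedure of the introduction (\+ stands for not/1).
efface(X, [H|T], [H|TEff]) :- efface(X, T, TEff), \+ X = H.
efface(X, [X|T], T).

% check_first_directionality(+X, +T): on input, X is ground and T is a
% ground list (TEff is left unbound by the call below); on output,
% sol =< 1 and length(TEff_out) = length(T_in) - 1 for every answer.
check_first_directionality(X, T) :-
    ground(X), is_list(T), ground(T),
    findall(TEff, efface(X, T, TEff), Answers),
    length(Answers, Sol),
    Sol =< 1,
    length(T, LenIn),
    forall(member(Out, Answers),
           ( ground(Out),
             length(Out, LenOut),
             LenOut =:= LenIn - 1 )).
```

For example, `?- check_first_directionality(a, [b,a,c]).` succeeds. A call that fails, such as efface(z, [a,b], TEff), also satisfies the conditions, since its answer sequence is empty.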

3.2 Formal Specifications 

Formal specifications describe abstract sequences using a concrete syntax more 
convenient for a programmer than the mathematical formalism of abstract se- 
quences. A formal specification may contain other information needed for the 
analyser, like the induction parameter for proving termination. For instance, the 
following two specifications for the efface procedure can be written by the user 
(and can be checked by the analyser): 

efface 
    in(X:gr, T:list(gr), TEff:var) 
    out(_, _, list(gr)) 
    srel(TEff_out = T_in-1) 
    sol(sol =< 1) 
    sexpr(T) 

efface 
    in(X:gr, T:any, TEff:list(gr)) 
    out(_, list(gr), _) 
    srel(TEff_in = T_out-1) 
    sol(sol =< TEff_in+1) 
    sexpr(TEff) 

We find the different parts of an abstract sequence. The first specification 
considers the situation where input X is a ground term, input T is a ground list, 
and TEff is a variable. After success of execution, TEff becomes a ground list, 
whose list-length is the list-length of input T minus one. The symbol '_' is used 
when we do not provide refined information about an argument. The execution 
terminates (the size expression T decreases through recursive calls) and is 
deterministic (sol =< 1). In the second specification, input X is a ground term, 
T is any term, and TEff is a ground list. This execution terminates and is 
non-deterministic (the number of solutions is between 0 and the list-length of 
input TEff plus one). 

3.3 Abstract Execution and Annotated Procedures 

Abstract execution is performed on normalised procedures (see Section 2.1), 
because it simplifies the design of abstract operations. The analyser therefore 
first translates a general Prolog procedure into an equivalent normalised 
procedure. The analysis is compositional. The system verifies a procedure against 
a specification, by assuming that the specifications hold for subproblems. For a 
given program, it analyses each procedure; for a given procedure, it analyses each 
clause; for a given clause, it analyses each atom. If an atom in the body of a 
clause is a procedure call, the analyser looks at the given specifications to infer 
information about its execution. The analyser succeeds if, for each procedure 
and each specification describing this procedure, the analysis of the procedure 
yields results that are covered by the considered specification. 

As a result of the analysis, the procedure is annotated with abstract sequences 
at each program point. The annotation of the clause 
$c ::= p(X_1, \ldots, X_n)$ :- $l_1, \ldots, l_s$. 
in the context of an abstract sequence $B_p = \langle\beta^p_{in}, \ldots\rangle$ 
is of the form: 

    $(\beta^p_{in})\ p(X_1, \ldots, X_n)$ :- $(B_0)\ l_1, (B_1) \ldots, l_s (B_s). (B_c)$ 


The analyser certifies that every abstract sequence at a program point safely 
approximates the sequence of answer substitutions computed until that point. 
Let $\theta$ be an input program substitution in $Cc(\beta^p_{in})$. For each 
program point i ($0 \leq i \leq s$), the concrete execution of 
$(l_1, \ldots, l_i)$ is approximated by $B_i$, i.e., 
$(\theta, (l_1, \ldots, l_i), c) \mapsto_P S_i$ implies 
$(\theta, S_i) \in Cc(B_i)$. Similarly, the whole clause execution is approximated 
by $B_c$, i.e., $(\theta, c) \mapsto_P S$ implies $(\theta, S) \in Cc(B_c)$. The 
annotation of a procedure in the context of an abstract sequence $B_p$ is the 
sequence of its annotated clauses. 

3.4 Checking Semantic Properties 

This section explains how the analyser can check the semantic properties of 
Section 2.3 that are useful for applying the code transformations. The components 
of the abstract sequences provide constraints about the length of the computed 
answer substitutions. 

Let $B = \langle\beta_{in}, \beta_{ref}, \beta_{fails}, U, \beta_{out}, E_{ref\_out}, E_{sol}\rangle$ 
be the abstract sequence approximating the execution of a procedure, or of a 
clause, or of a prefix in a clause, or of a literal l after the execution of a 
goal. Let 
$B_i = \langle\beta^i_{in}, \beta^i_{ref}, \beta^i_{fails}, U^i, \beta^i_{out}, E^i_{ref\_out}, E^i_{sol}\rangle$ 
be the abstract sequence modelling the execution of the clause $c_i$, or the 
execution of some prefix of the clause $c_i$ ($1 \leq i \leq 2$), where $c_1$ and 
$c_2$ have the same name and arity. 

- If deterministic(B) returns true then $(\theta, S) \in Cc(B)$ implies 
  $|S| \leq 1$. The value of deterministic(B) is set to true if sol =< 1 is a 
  solution of $E_{sol}$. 

- If fully_deterministic(B) returns true then $(\theta, S) \in Cc(B)$ implies 
  $|S| = 1$. The value of fully_deterministic(B) is set to true if 
  $\beta_{in} = \beta_{ref}$, and $\beta_{fails} = \emptyset$, and sol = 1 is a 
  solution of $E_{sol}$. 

- If test_literal(B) returns true then $(\theta, S) \in Cc(B)$ implies 
  $S = \langle\rangle$ or $S = \langle\theta\rangle$. The value of test_literal(B) 
  is set to true if sol =< 1 is a solution of $E_{sol}$ and if the untouched 
  component U contains all the indices of $\beta_{ref}$. 

- Let $(\theta, S_1) \in Cc(B_1)$ and $(\theta, S_2) \in Cc(B_2)$. If 
  exclusive($B_1$, $B_2$) returns true then $S_1 = \langle\rangle$ or 
  $S_2 = \langle\rangle$. The value of exclusive($B_1$, $B_2$) is set to true if 
  one of the three following conditions holds: 

  • the refined components $\beta^1_{ref}$ and $\beta^2_{ref}$ are incompatible: 
    $\beta^1_{ref} \sqcap \beta^2_{ref} = \bot$ 
  • or $\exists \beta^1_f \in \beta^1_{fails} : (\beta^1_{ref} \sqcap \beta^2_{ref}) \leq \beta^1_f$ 
  • or $\exists \beta^2_f \in \beta^2_{fails} : (\beta^1_{ref} \sqcap \beta^2_{ref}) \leq \beta^2_f$ 

The accuracy and the cooperation between the abstract domains allow the 
analyser to detect automatically whether the conditions are satisfied for applying 
code transformations. The next section discusses the transformation strategy 
realized by the optimiser. 
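
The first three checks can be sketched over a deliberately simplified stand-in for abstract sequences. In the sketch below, the term aseq/5, its argument layout, and the replacement of $E_{sol}$ by a plain interval Min-Max on the number of solutions are all assumptions made for illustration; the real domains are far richer, and the exclusivity check (which needs the greatest lower bound of abstract substitutions) is left out.

```prolog
% aseq(In, Ref, Fails, Untouched, Min-Max): simplified abstract sequence.
%   In, Ref   : lists of Index-Mode pairs (stand-ins for beta_in, beta_ref)
%   Fails     : list of such lists (stand-in for beta_fails)
%   Untouched : list of indices (stand-in for U)
%   Min-Max   : integer bounds on the number of solutions (stand-in for E_sol)

% At most one solution.
deterministic(aseq(_, _, _, _, _-Max)) :-
    Max =< 1.

% Every accepted input succeeds, no failure condition, exactly one solution.
fully_deterministic(aseq(In, Ref, [], _, 1-1)) :-
    In == Ref.

% At most one solution and no index of Ref is instantiated.
test_literal(aseq(_, Ref, _, Untouched, _-Max)) :-
    Max =< 1,
    forall(member(Index-_, Ref), memberchk(Index, Untouched)).
```

Under these assumptions, a unification between two ground terms could be described by `aseq([1-gr,4-gr], [1-gr,4-gr], [], [1,4], 0-1)`: it is deterministic and a test literal, but not fully deterministic, since it may fail.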

4 A Strategy to Generate Specialised Code 

In the derivation of a procedure, there is often more than one possible sequence 
of transformations, resulting in different procedures. Some heuristics must then 
be included within the automation of the derivation of procedures to find 
permutations leading to efficient procedures for a given specification. For 
instance, the following heuristics are suggested in [6]: 



— Choose tail recursive permutations. Last call optimisation is implemented in 
the WAM [14], such that the environment is deallocated before executing the 
last call. The efficiency gain is substantial if the last literal is a recursive call. 

— Choose permutations of the literals with the longest deterministic prefix. This 
choice prevents useless computed answer substitutions by prefixes of the 
literals. Multiple answer substitutions are only generated by the suffixes. 

— Choose permutations of the literals that support the introduction of cuts and 
the removal of useless literals. 

— Choose permutations of the literals such that the unifications are at the be- 
ginning. This is useful to instantiate the clause heads and to suppress these 
equality literals. 

The combination of transformation rules performed by the optimiser is guided 
by the above heuristics, and is illustrated on the example efface, whose initial 
multidirectional source code is: 

efface(X, [H|T], [H|TEff]) :- efface(X, T, TEff), not(X=H). 
efface(X, [X|T], T). 

The following directionality is considered (expressed as a formal specification): 

efface 
    in(X:gr, T:list(gr), TEff:any) 
    sol(sol =< 1) 

STEP A: Syntactic normalisation. Most code transformations apply to 
normalised programs, and the abstract execution itself is defined on normalised 
programs, because it facilitates the analysis. Thus, the first step of the optimiser 
consists of translating the original procedure into an equivalent normalised code: 

efface(X1,X2,X3) :- X2=[X4|X5], X3=[X4|X6], efface(X1,X5,X6), not(X1=X4). 
efface(X1,X2,X3) :- X2=[X1|X3]. 

STEP B: Code annotation. In this step, the information needed to apply 
the code transformations is captured by the checker. Every clause of the nor- 
malised procedure is annotated with abstract sequences at each program point. 

STEP C: Clause reordering. If the order of solutions is not modified, 
some clause reordering is achieved (Rule 1): a clause containing a literal that is 
a good candidate to be removed after the possible introduction of cuts, is placed 
at the bottom of the procedure. In the example efface and for the considered 
directionality, the following code is generated (clause reordering can be realized 
because the procedure is deterministic): 

efface(X1,X2,X3) :- X2=[X1|X3]. 
efface(X1,X2,X3) :- X2=[X4|X5], X3=[X4|X6], efface(X1,X5,X6), not(X1=X4). 

STEP D: Semantic normalisation. In order to insert a cut at the leftmost 
position in a clause, it may be useful to decompose a unification that may fail 
into equivalent but simpler unifications. It may then happen that a cut will be 
placed between such unifications, instead of after the (unique) global unification. 
This step is called semantic normalisation to distinguish it from the syntactic 
normalisation performed at step A: the optimiser uses the semantic information 
available at each program point about the structure, the mode, the type, 
the sharing and the linearity of terms, as well as the sure success of the execution 
of a unification. The following code is generated for efface (the last clause is 
not normalised semantically because no cut will be inserted there): 

efface(X1,X2,X3) :- X2=[X4|X5], X4=X1, X5=X3. 
efface(X1,X2,X3) :- X2=[X4|X5], X3=[X4|X6], efface(X1,X5,X6), not(X1=X4). 

In the first clause, the unification X2=[X1|X3] has been decomposed into three 
elementary unifications: the initial unification succeeds if X2 is a non-empty list 
(i.e., X2=[X4|X5]), and if the first element of X2 is X1 (i.e., X4=X1), and if the 
tail of X2 is X3 (i.e., X5=X3). 

STEP E: Leftmost cut insertion with dead code elimination. The 
objective of this step is to insert leftmost green cuts in every clause, except in 
the last clause (Rule 2). In our example, a green cut is introduced in the first 
clause, at the program point where it is exclusive with the second clause: 

efface(X1,X2,X3) :- X2=[X4|X5], X4=X1, !, X5=X3. 
efface(X1,X2,X3) :- X2=[X4|X5], X3=[X4|X6], efface(X1,X5,X6), not(X1=X4). 

The cut has been inserted before the last unification X5=X3. If we had not per- 
formed the previous step, then the cut would have been placed at the end of the 
first clause. When inserting a cut, some successive clauses may become useless, 
such that they can be removed (Rule 3). 

STEP F: Move cuts backwards. The objective of this step is to obtain 
the longest deterministic prefix before executing a cut in a clause. While making 
sure that the procedure still terminates and that the subcalls are still correctly 
moded and typed, some literals are reordered inside the clauses. An inserted 
cut is moved backwards by passing literals that surely succeed before the cut 
(Rule 4). In our example, the cut cannot be moved backwards, because the 
unification X5=X3 does not surely succeed (at that point, X5 is a ground list 
and X3 is any term). 

STEP G: Removing useless literals. The analyser is able to capture the 
input conditions for which a cut is surely executed. This information is used to 
refine the input conditions of the successive clauses. This allows the optimiser 
to remove some literals which become useless (Rule 5) (e.g., negation, test 
predicates, arithmetic built-ins). In our example, the cut is surely executed when 
the first element of input list X2 is the input X1. The second clause is thus surely 
not executed for that input. In particular, the negation surely succeeds and can 
therefore be suppressed safely: 

efface(X1,X2,X3) :- X2=[X4|X5], X4=X1, !, X5=X3. 
efface(X1,X2,X3) :- X2=[X4|X5], X3=[X4|X6], efface(X1,X5,X6). 
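
Removing the negation is safe only because the cut is executed as soon as X4=X1 succeeds. The following sketch contrasts the code above (renamed efface_g/3) with a deliberately wrong variant, efface_late/3, in which the cut is placed at the end of the first clause. Both names are hypothetical, and the wrong variant is not something the optimiser generates; it is shown only to illustrate why the cut position matters.

```prolog
% Code after Step G: the cut is executed before X5=X3 is attempted.
efface_g(X1, X2, X3) :- X2 = [X4|X5], X4 = X1, !, X5 = X3.
efface_g(X1, X2, X3) :- X2 = [X4|X5], X3 = [X4|X6], efface_g(X1, X5, X6).

% Wrong variant (illustration only): the cut comes after X5=X3, so a
% failure of X5=X3 lets the second clause run even though X4=X1 held.
efface_late(X1, X2, X3) :- X2 = [X4|X5], X4 = X1, X5 = X3, !.
efface_late(X1, X2, X3) :- X2 = [X4|X5], X3 = [X4|X6], efface_late(X1, X5, X6).
```

With the arguments (a, [a,b,a], [a,b]), efface_g fails, as the code with the negation does, whereas efface_late succeeds by removing the second occurrence of a. On calls such as (a, [b,a,c], R), both versions give R = [b,c].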

STEP H: Semantic denormalisation. The code generated until this step 
is still normalised (with possibly added cuts). Thus, it remains inefficient. The 
last step consists of applying the reverse transformation. The semantic denor- 
malisation uses the information captured by the analyser to suppress explicit 
unifications, or to replace them by simpler ones, and to place them implicitly in 
the head of clauses. The specialised code for efface is thus finally generated: 

efface(X1,[X1|X2],X3) :- !, X2=X3. 
efface(X1,[X4|X2],[X4|X3]) :- efface(X1,X2,X3). 
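
As a sanity check, the specialised code can be compared with a multidirectional version on calls of the considered class (X ground, T a ground list). In the sketch below, efface_src/3 is a reconstruction obtained by denormalising the Step D code, which is an assumption since the original listing is not repeated here; efface_spec/3 is the specialised code renamed; same_answers/2 is a hypothetical helper.

```prolog
% Multidirectional version, reconstructed from the normalised code
% (assumption). \+ is the standard spelling of not/1.
efface_src(X, [X|T], T).
efface_src(X, [H|T], [H|TEff]) :- efface_src(X, T, TEff), \+ X = H.

% Specialised code generated after Step H (renamed).
efface_spec(X1, [X1|X2], X3) :- !, X2 = X3.
efface_spec(X1, [X4|X2], [X4|X3]) :- efface_spec(X1, X2, X3).

% Hypothetical helper: both versions yield the same sequence of answers.
same_answers(X, T) :-
    findall(R, efface_src(X, T, R), Rs),
    findall(R, efface_spec(X, T, R), Rs).
```

For example, same_answers(a, [b,a,c,a]) succeeds (the single answer of both versions is [b,c,a]), and so does same_answers(z, [a,b]), where both versions fail. Such a check covers only the tested inputs; it does not replace the correctness argument of the optimiser.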



Remark. It may happen that the initial source code already contains some 
cuts. In such a situation, the optimiser first removes every green cut, so that 
only the necessary leftmost cuts are introduced during step E. 

5 Experimental Evaluation 

Table 2 compares the execution of the original and the generated versions 
of efface(X,T,TEff) presented in the previous section. Several tests have been 
performed by considering executions that succeed and that fail, and by varying 
the length of the input list T. The table shows that the multidirectional code 
is less efficient than the specialised one in terms of execution time and of used 
local stack. In particular, the specialised code uses a constant amount of local 
stack (independently of the size of the input), whereas executing the 
multidirectional code with an input list of size 25000 yields a local stack error. 
Table 2. Program efface(X,T,TEff) executed 1000 times, when X is a ground 
term, T is a ground list, and TEff is any term. Several tests are performed, 
depending on whether the execution fails or succeeds, and depending on the list-length of T. 
The program has been tested on a 1.5 GHz Pentium, 1 GB RAM, Linux Suse, 
SWI-Prolog v 5.4.6 [15]. The size to which the local stack is allowed to grow is 2048000 bytes. 

The optimiser has been tested on some classical programs, borrowed 
from [2, 6, 13] and from the Internet. The source programs, the formal 
specifications, the generated specialised codes, and the efficiency tests are available at 
http://www.info.ucl.ac.be/~gobert. The tests were run on a 1.5 GHz Pentium, 
1 GB RAM, Linux Suse, with SWI-Prolog v 5.4.6 [15]. 

Tests on Execution Time. Table 3 reports on execution time speedup, 
defined as the ratio between the execution time spent for the source program 
and for the specialised program. A speedup greater than (resp. less than) one 
means that the specialised code is more (resp. less) efficient than the source 
code. We consider 59 procedures and 112 specifications (some procedures have 
several specifications, because they are multidirectional). The benchmark all is 
composed of 173 efficiency tests (there is at least one efficiency test for each 
specification of a procedure). The benchmark det is a subset of the benchmark all. 
It contains only the efficiency tests for the deterministic procedures. The 
benchmark ss contains the efficiency tests for the procedures that surely succeed. The 
benchmark det+ss contains the efficiency tests for the deterministic procedures 
that surely succeed. The mean speedup ranges from 1.42 to 1.68. The maximal 
speedup is 8.54 and the minimal speedup is 0.59. 
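
A speedup of this kind could be measured with a harness along the following lines. This is only a sketch under stated assumptions, not the harness actually used for the tables: time_goal/3 and speedup/4 are hypothetical names, and the timing relies on the statistics/2 key cputime provided by SWI-Prolog.

```prolog
% Hypothetical helper: CPU time (in seconds) spent running Goal N times.
% A failure-driven loop undoes the bindings between two runs.
time_goal(N, Goal, Seconds) :-
    statistics(cputime, T0),
    (   between(1, N, _),
        once(Goal),        % first solution only; a failing Goal is fine
        fail
    ;   true
    ),
    statistics(cputime, T1),
    Seconds is T1 - T0.

% Speedup as defined in the text: time of the source program divided by
% time of the specialised program. Fails if the latter is measured as 0.
speedup(N, SourceGoal, SpecialisedGoal, Speedup) :-
    time_goal(N, SourceGoal, TSrc),
    time_goal(N, SpecialisedGoal, TSpec),
    TSpec > 0,
    Speedup is TSrc / TSpec.
```

The number of repetitions N has to be large enough for the measured times to be significantly greater than the clock resolution; otherwise the ratio is meaningless.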
Table 3. Execution time speedup between source codes and specialised codes generated 
by the optimiser: 173 efficiency tests distributed over 59 predicates and 112 formal 
specifications. Of the 173 tests, 112 show a speedup and 61 show a slowdown. 

Tests on Space Utilisation. Table 4 reports on local stack utilisation. 57 
generated procedures are considered. The maximal amount of local stack used 
during the execution of the generated code is either reduced (for 28 procedures), 
or identical (17 procedures), or increased (12 procedures) w.r.t. the maximal 
amount of local stack used during the execution of the source code. 
Table 4. Comparisons between source and generated codes in terms of space utilisation. 

Accuracy of the Tests. The results reported on Table 3 and Table 4 de- 
pend on the choice of the efficiency tests. It is impossible to perform tests that 
include every situation in the context of a directionality. We have tried to make 
sufficiently general tests for each directionality. The efficiency results depend 
also on the way the source code is written. For all the benchmarks reported in 
the tables, the original program was written in the usual Prolog style (i.e., not 
normalised). The optimiser can take as input programs that are normalised, like 
the ones that are derived in the methodology [6]. If we perform the same effi- 
ciency tests with normalised programs as initial source code, then we obtain a 
mean speedup of 3. This shows the utility of the optimiser. 

Generated code may sometimes be less efficient. In general, the op- 
timiser generates code that is more efficient than the source code, in terms of 
execution time and of local stack utilisation. But the tables report that a gen- 
erated code can sometimes be less efficient than the original code. For instance, 
this occurs with the append(L1,L2,L3) procedure which concatenates two lists 
(we consider the usual directionality where inputs L1 and L2 are ground lists 
and L3 is a variable). The initial source code is: 

append([],L,L). 
append([H|L1],L2,[H|L3]) :- append(L1,L2,L3). 

In SWI-Prolog, indexing is enabled by default on the first argument. Thus, in the 
context of the considered directionality, no choice point is created on the local 
stack. Indeed, the right clause is directly selected according to the principal 
functor of the first input list, which is either an empty or a non-empty list. 
Furthermore, the second clause is a chain rule (i.e., there is only one atom in the 
body), such that no environment frame is allocated on the local stack. Therefore, 
this source code uses a constant amount of local stack, and executes very quickly. 

The optimiser does not take into account the indexing technique in its strat- 
egy for applying correct code transformations, and the following code will be 
generated: 

append([H|L1],L2,[H|L3]) :- append(L1,L2,L3), !. 
append(_,L,L). 

where the two clauses are reordered, a cut is inserted after the recursive call, and 
the constant [] in the second clause is removed. This code is less efficient than 
the initial source code. Indexing has no effect because the right clause cannot 
be selected according to the principal functor of the input list L1. Thus, Prolog 
creates a choice point on the local stack each time the first clause is called (and 
it is often executed because it is the recursive call). The cut is a deep cut (i.e., it 
is not located just after the symbol :-). It occurs after the recursive call, so 
that the first clause is no longer a chain rule, and an environment frame must be 
allocated on the local stack each time we execute the first clause. The amount 
of local stack used during execution therefore grows with the recursive calls, 
unlike the initial source code, which uses a constant amount of local stack. 
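
The two versions can be put side by side as follows. The predicates are renamed append_src/3 and append_gen/3 (hypothetical names) so that they do not clash with the library predicate append/3; the clauses themselves are those quoted above.

```prolog
% Source version: with first-argument indexing, the clause is selected
% from the principal functor of the first list ([] or a list cell).
append_src([], L, L).
append_src([H|L1], L2, [H|L3]) :- append_src(L1, L2, L3).

% Generated version: clauses reordered, deep cut after the recursive
% call, and [] replaced by an anonymous variable in the last clause.
append_gen([H|L1], L2, [H|L3]) :- append_gen(L1, L2, L3), !.
append_gen(_, L, L).
```

Both versions compute the same result for the considered directionality, e.g. the arguments ([1,2], [3], R) give R = [1,2,3]. The difference lies only in resource usage. Note also that append_gen is correct only for that directionality: since its last clause no longer tests for [], a call whose first argument is not a list still succeeds.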

6 Related Work 

The Mercury programming language [12] is associated with a whole range of 
analysis tools for optimisation purposes. In Mercury also, the programmer has 
to annotate the program with information about modes, types, success and de- 
terminacy. A main difference is that not all logic programs are accepted by 
Mercury (only limited forms of unification are allowed). There are restrictions 
on the form of the programs and queries in order to generate more efficient code. 
Mercury is not based on the Warren's Abstract Machine, but has specialised - 
more efficient - algorithms, depending on the determinacy information. So our 
approach is more appealing to programmers who are willing to keep the full 
power of Prolog. Unlike Mercury, we do not change the usual execution model of 
Prolog based on the WAM [1, 14]: our optimiser performs some transformations 
at the Prolog code level. In principle, our optimiser could also generate 
low-level code, but performing source-to-source transformations is more portable 
(and easier to explain) than generating specialised WAM instructions. Most 
Mercury annotations can be translated into our formal specification language. 
A current limitation of our language is that, for instance, we cannot express 
that an input list contains only free distinct variables (this can be expressed in 
Mercury). On the other hand, more general directionalities can be described in 
our language. For instance, we can express that some argument is a list possi- 
bly non-instantiated and possibly non-linear, and that two terms possibly share 
a variable (this cannot be expressed in Mercury), like in the following formal 
specification for append: 

append 
in(L1:list(any), L2:list(any), L3:any) 
out(_, _, list(any)) 
sol(sol =< 1) 



The Ciao preprocessor CiaoPP [8] is a powerful static analyser based on ab- 
stract interpretation, which features many analyses similar to ours and other 
ones. The system can infer and/or check properties like regular types, modes, 
sharing, non-failure and determinacy, bounds on computational cost, bounds on 
sizes of terms in the program, and termination. In that system, procedures can 
optionally be annotated with assertions, which partially correspond to the 
specifications of our system. The system can perform automatic optimisations such as 
source-to-source transformation, specialisation, partial evaluation of programs, 
and program parallelisation. Some transformations like cut insertion and semantic 
denormalisation are not performed in CiaoPP. 

The authors of [5] consider how most of the common uses of cut can be 
eliminated from Prolog source programs, by relying on static analysis to generate 
them at compile time. Static analysis techniques are used to detect situations 
where cuts can be placed. In our approach, the insertion of cuts is only one part of 
the optimisation process: several source-to-source transformations are applied 
to find the best place for the cut (e.g., clause and literal reordering, 
semantic normalisation, etc.), to remove literals becoming useless due to the 
execution of cut in previous clauses, and to perform some partial evaluation. 

7 Conclusion and Future Work 

We have presented an optimiser based on abstract interpretation which attempts 
to make a Prolog program more efficient by transforming its source code without 
changing its operational semantics. The tool automatically performs safe code 
transformations of any Prolog program in the context of some class of input 
calls (described in formal specifications). Preliminary experimental tests of the 
optimiser show encouraging results, since the specialised codes are, on 
average, 1.42 times more efficient in terms of execution time and of 
local stack utilisation. The optimiser can be improved: we plan to find better 
heuristics for specialising the code, based on the knowledge of the WAM [1, 14], 
in order to take into account the indexing technique and the influence on 
efficiency of adding deep cuts or not. 